Representing masked bits in rings

7 March 2024 (updated 7 March 2024)

$$ \def\vec#1{\mathbf{#1}} \gdef\mat#1{\mathrm{#1}} \def\T{\mathsf{T}} \def\F{\mathsf{F}} \def\U{\mathsf{U}} \def\popcount{\mathtt{popcount}} \def\count{\mathtt{count}} \def\hamming{\mathtt{hamming}} \def\fhd{\mathtt{fhd}} \def\vsum{\mathtt{sum}} $$

In cryptography it is often useful to represent bits in algebraic objects. I will specifically consider rings $ℤ_m$ of integers modulo $m$. If $m$ is a prime number this happens to also be a field, but the difference is not very important here.

Bits are elements of $𝔹 = \{\T, \F\}$, i.e. true and false, with the usual operations of not $¬$, and $∧$, or $∨$ and xor $⊕$.

Given bit vectors $\vec a, \vec b ∈ 𝔹^n$ let the $\popcount: 𝔹^n → [0,n]$ function return the number of $\T$ values in a vector. We can then define the hamming distance function as $\hamming(\vec a, \vec b) = \popcount(\vec a ⊕ \vec b)$.

Masked bits are elements of $𝕋 = \{\T, \F, \U\}$, i.e. true, false, and unavailable. The operations are the same as for binary, except when one of the arguments is $\U$, the result is always $\U$.

We take $\popcount$ to count the number of $\T$'s and introduce $\count$ to count the number of available entries i.e. either $\F$ or $\T$ but not $\U$. We can then define a fractional hamming distance $\fhd$ as

$$ \fhd(\vec a, \vec b) = \frac{\popcount(\vec a ⊕ \vec b)}{\count(\vec a ⊕ \vec b)} $$

The $\{0,1\}$ representation of bits

We represent $\F, \T$ as $0,1$ respectively. The usual binary operations of not $¬$, and $∧$, or $∨$, xor $⊕$, can respectively be represented using ring operations as

$$ \begin{aligned} ¬ a &= 1 - a \\ a ∧ b &= a ⋅ b \\ a ∨ b &= a + b - a ⋅ b \\ a ⊕ b &= a + b - 2 ⋅ a ⋅ b \\ \end{aligned} $$

If we have a vector $\vec a ∈ 𝔹^n$ we can represent it element wise in $ℤ_m^n$. This allows us to represent the $\popcount$ function:

$$ \begin{aligned} \popcount(\vec a) &= \vsum(\vec a) \mod m\\ \end{aligned} $$

Observing that the sum over elementwise products is the dot product, we obtain the following useful identity:

$$ \popcount(\vec a ∧ \vec b) = \vec a ⋅ \vec b \mod m $$

This representation is reversible for any $m ≥ 2$, except for $\popcount$ which need $m ≥ n$.

$$ \hamming(\vec a, \vec b) = \vec a ⋅ \vec a + \vec b ⋅ \vec b - 2 ⋅ \vec a ⋅ \vec b $$

The $\{-1, 1\}$ representation of bits

We represent $\F, \T$ as $-1,1$ respectively. The usual binary operations of not $¬$, and $∧$, or $∨$, xor $⊕$, can respectively be represented using ring operations as

$$ \begin{aligned} ¬ a &= - a \\ a ∧ b &= ? \\ a ∨ b &= ? \\ a ⊕ b &= - a ⋅ b \\ \end{aligned} $$

If we have a vector $\vec a ∈ 𝔹^n$ we can represent it element wise in $ℤ_m^n$. This allows us to represent the $\popcount$ function:

$$ \begin{aligned} \popcount(\vec a) &= ½ ⋅(n - \vsum(\vec a)) \\ \end{aligned} $$

Given a $m$ bit vectors $\vec a_i ∈ 𝔹^n$ we can construct a matrix $\mat A ∈ 𝔹^{n×m}$ such tht $\mat A_{i,:} = \vec a_i$. If we have a second matrix of hamming distances $\mat D ∈ [0,n]^{m×m}$ such that $\mat D_{ij} = \hamming(\vec a_i, \vec a_j)$ then the following holds:

$$ \mat D = ½ ⋅(n⋅\mat J - \mat A ⋅ \mat A^\T) $$

Where $\mat J$ is a matrix of all ones. Suppose we are given $\mat D$, we can compute $\mat D' = n⋅\mat J - 2⋅\mat D$ and solve the following quadratic system:

$$ \begin{aligned} \mat D' &= \mat A ⋅ \mat A^\T & \mat J &= \mat A ⊙ \mat A \end{aligned} $$

The $\{-1,0,1\}$ representation of masked bits

We represent $\F, \U, \T$ as $-1, 0, 1$ respectively. The binary operations become

$$ \begin{aligned} ¬ a &= - a \\ a ∧ b &= ½ ⋅ a ⋅ b ⋅ (1 + a + b - a ⋅ b) \\ a ∨ b &= ½ ⋅ a ⋅ b ⋅ (a ⋅ b + a + b -1 ) \\ a ⊕ b &= -a ⋅ b \\ \end{aligned} $$

Note that the formulas for $∧, ∨$ can be evaluated using two multiplies in a depth-2 circuit.

Given a masked bitvector $\vec a ∈ 𝕋^n$, we can define

$$ \begin{aligned} \count(\vec a) &= \vsum\left(\vec a^{⊙2}\right) \\ \popcount(\vec a) &= ½ ⋅ \vsum\left(\vec a^{⊙2} + \vec a\right) \end{aligned} $$

We can represent this on a suitably large (Q how large?) ring as $-1,0,1$ for $\F, \U, \T$ respectively.

Note that squaring, $\vec a^{⊙2}$, produces a regular $0,1$ bitvector that is $0$ whenever the value is $\U$ and $1$ otherwise, i.e. it allows us to extract the mask as a regular bitvector. Similarly $½ ⋅(\vec a^{⊙2} + \vec a)$ extracts the data bits: $1$ for $\T$ and $0$ otherwise. The reverse mapping, converting data bits $\vec b$ and mask bits $\vec m$ to a masked bitvector is $\vec m - 2⋅\vec b⊙\vec m$. If the data bits are known to be $\F$ in the in the unavailable region, then it simplifies to $\vec m - 2⋅\vec b⊙\vec m$.

In this representation and and or have awkward fourth degree expressions, though they can be evaluated using only two multiplies. The expressions for xor, $\count$ and $\popcount$ are quite nice considering that they correctly account for masks. This makes it a suitable system for computing fractional hamming distances.

This representation is reversible for any $m ≥ 2$, except for $\popcount$ which need $m ≥ n$.

Fractional hamming distance

The fractional hamming weight of a masked bitvector $\vec a$ is defined as

$$ \begin{aligned} \mathtt{fhw}(\vec a) &= \frac {\popcount(\vec a)} {\count(\vec a )} \end{aligned} $$

and the fractional hamming distance between two masked bitvectors $\vec a$ and $\vec b$ as

$$ \begin{aligned} \fhd(\vec a, \vec b) &= \mathtt{fhw}(\vec a ⊕ \vec b) = \frac {\popcount(\vec a ⊕ \vec b)} {\count(\vec a ⊕ \vec b)} \end{aligned} $$

In the ring representation these can be computed as

$$ \begin{aligned} \fhd(\vec a, \vec b) & =\frac { ½ ⋅ \vsum\left((-\vec a ⊙ \vec b)^{⊙2} -\vec a ⊙ \vec b\right)} {\vsum\left((-\vec a ⊙ \vec b)^{⊙2}\right)} &&=\frac{1}{2} -\frac { \vsum\left(\vec a ⊙ \vec b\right)} {2⋅\vsum\left((\vec a ⊙ \vec b)^{⊙2}\right)} \\ \end{aligned} $$

Note that $(\vec a ⊙ \vec b)^{⊙2} = \vec a^{⊙2} ⊙ \vec b^{⊙2}$ and thus depends only on the masks of $\vec a$ and $\vec b$. Given the masks $\vec a_m$ and $\vec b_m$ it can be computed in binary as

$$ \vsum\left((\vec a ⊙ \vec b)^{⊙2}\right) = \popcount(\vec a_m ∧ \vec b_m) $$